Classifying ASR Transcriptions According to Arabic Dialect

Authors

  • Abualsoud Hanani
  • Aziz Qaroush
  • Stephen Taylor
Abstract

We describe several systems for identifying short samples of Arabic dialects, which were prepared for the shared task of the 2016 DSL Workshop (Malmasi et al., 2016). Our best system, an SVM using character trigram features, achieved an accuracy of 0.4279 on the task's test data, compared to a baseline of 0.20 for chance guesses, or 0.2279 if we had always chosen the most frequent class in the test set. By comparison, the team with the best weighted F1 score achieved an accuracy of 0.5117. The team entries seem to fall into cohorts, with all the teams in a cohort within a standard deviation of each other; our three entries are in the third cohort, which is about seven standard deviations from the top.
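The two ingredients named in the abstract, character trigram features and the chance and majority-class baselines, can be illustrated with a minimal Python sketch. The function names and toy labels below are hypothetical; the abstract does not give the authors' actual feature pipeline, SVM settings, or data, so this shows only the general idea and omits the classifier itself.

```python
from collections import Counter


def char_trigrams(text):
    """Count overlapping character trigrams in a transcript.

    Hypothetical helper: the paper's real tokenisation and
    normalisation steps are not described in the abstract.
    """
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing over num_classes."""
    return 1.0 / num_classes


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


if __name__ == "__main__":
    # Toy inputs, invented for illustration only.
    print(char_trigrams("abcab"))
    print(chance_accuracy(5))
    print(majority_baseline_accuracy(["EGY", "EGY", "GLF", "MSA"]))
```

The 0.20 chance baseline quoted above is what `chance_accuracy` returns for five equally likely classes. In a full system, trigram counts like these would be turned into feature vectors and passed to a classifier such as an SVM.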


Similar resources

Arabic Dialect Identification Using iVectors and ASR Transcripts

This paper presents the systems submitted by the MAZA team to the Arabic Dialect Identification (ADI) shared task at the VarDial Evaluation Campaign 2017. The goal of the task is to evaluate computational models to identify the dialect of Arabic utterances using both audio and text transcriptions. The ADI shared task dataset included Modern Standard Arabic (MSA) and four Arabic dialects: Egypti...


Pronunciation Modelling and Lexical Adaptation in Mid-size Vocabulary ASR

A computational-phonological method is presented to automatically adapt the phone transcriptions in a lexicon to improve ASR performance in a number of mid-size recognition tasks. The lexical adaptation approach is based on supervised phoneme loops using cd-HMM segments to find alternatives for the transcriptions, and can be considered as a counterpart of the K-means algorithm but on symbolic l...


Collecting Data for Automatic Speech Recognition Systems in Dialectal Arabic Using Games with a Purpose

Building Automatic Speech Recognition (ASR) systems for spoken languages usually suffers from the problem of limited available transcriptions. ASR systems require large speech corpora that contain speech and the corresponding transcriptions for training acoustic models. In this paper, we target Egyptian dialectal Arabic. Like other spoken languages, it is mainl...


Arabic Dialectical Speech Recognition in Mobile Communication Services

In this chapter we present a practical approach to building an Arabic automatic speech recognition (ASR) system for mobile telecommunication service applications. We also present a procedure for adapting the acoustic models to better account for pronunciation variation across the Arabic-speaking countries. Modern Standard Arabic (MSA) is the common spoken and written language ...


Efficient Harvesting of Internet Audio for Resource-Scarce ASR

Spoken recordings that have been transcribed for human reading (e.g. as captions for audiovisual material, or to provide alternative modes of access to recordings) are widely available in many languages. Such recordings and transcriptions have proven to be a valuable source of ASR data in well-resourced languages, but have not been exploited to a significant extent in under-resourced languages ...




Publication date: 2016